SUB-GAUSSIAN TAIL BOUNDS FOR THE WIDTH AND
HEIGHT OF CONDITIONED GALTON-WATSON TREES

Abstract. We study the height and width of a Galton-Watson tree
with offspring distribution $\xi$ satisfying $\mathbb{E}\xi = 1$,
$0 < \operatorname{Var}\xi < \infty$, conditioned on having exactly $n$ nodes.
Under this conditioning, we derive sub-Gaussian tail bounds for both the
width (largest number of nodes in any level) and height (greatest level
containing a node); the bounds are optimal up to constant factors in the
exponent. Under the same conditioning, we also derive essentially optimal
upper tail bounds for the number of nodes at level $k$, for $1 \le k \le n$.



1. Introduction 

A Galton-Watson tree is the family tree of a Galton-Watson process,
i.e., it is a random rooted tree, constructed recursively from the root, where
each node has a random number of children and these random numbers are
independent copies of some random variable $\xi$ taking values in $\{0, 1, \dots\}$.
We let $\mathcal{T}$ denote a (random) Galton-Watson tree. ($\mathcal{T}$ depends of
course on $\xi$, or rather its distribution, but the offspring distribution is fixed
throughout the paper and is therefore not shown explicitly in the notation.) We view
the children of each node as arriving in some random order, so that $\mathcal{T}$ is an
ordered, or plane, tree.

At times in the paper it will be useful to think of $\mathcal{T}$ as a subtree of
the so-called Ulam-Harris tree $\mathcal{U}$: this is the tree with root $\varnothing$
whose non-root nodes correspond to finite sequences of integers $v_1 \dots v_k$, with
$v_1 \dots v_k$ having parent $v_1 \dots v_{k-1}$ and children
$\{v_1 \dots v_k i : i \in \{1, 2, \dots\}\}$. For a node $v$ of $\mathcal{U}$ we think
of $vi$ as the $i$'th child of $v$. Any rooted plane tree $T$ in which all nodes have
at most countably many children can be viewed as a subtree of $\mathcal{U}$ by sending
the root of $T$ to the root of $\mathcal{U}$ and using the ordering of children in $T$
to recursively define an embedding of $T$ into $\mathcal{U}$ (see e.g. [?]).

We will study the conditioned Galton-Watson tree $\mathcal{T}_n$, which is the
random tree $\mathcal{T}$ conditioned on having exactly $n$ nodes. In symbols,
$\mathcal{T}_n := (\mathcal{T} \mid |\mathcal{T}| = n)$, where, for any tree $T$,
$|T|$ denotes its number of nodes. (We consider in the sequel only $n$ such that
$\mathbb{P}(|\mathcal{T}| = n) > 0$.) For examples of standard types of random trees
that can be represented as conditioned Galton-Watson trees for suitable $\xi$, see
e.g. Devroye [7]. The conditioned Galton-Watson trees are essentially the same as
the random simply generated trees [?], see e.g. [?] or [?].

As is well-known, the distribution of the tree $\mathcal{T}_n$ is not changed if
$\xi$ is replaced by another random variable $\xi'$ whose distribution is obtained
by tilting (or conjugation) [3]: $\mathbb{P}(\xi' = k) = c a^k \mathbb{P}(\xi = k)$,
$k \ge 0$, for some $a > 0$ and normalizing constant $c$. (Necessarily,
$c = (\mathbb{E} a^{\xi})^{-1}$, and thus $\mathbb{E} a^{\xi} < \infty$.) We may,
except in some exceptional cases, by a suitable tilting assume that
$\mathbb{E}\xi = 1$, so that the branching process is critical. This turns out to be
convenient, and we will in the sequel always make this assumption. We further assume
that $\xi$ has finite variance $\sigma^2 := \operatorname{Var}\xi < \infty$. We
exclude the trivial case $\xi = 1$ a.s., i.e., we assume $\sigma^2 > 0$.
(Equivalently, when $\mathbb{E}\xi = 1$, $\mathbb{P}(\xi = 0) > 0$.)
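
The tilting step can be illustrated numerically. The following is a minimal sketch, assuming an offspring distribution with finite support given as a list `pmf` (a made-up example, not taken from the paper); the helper names `tilt`, `mean` and `tilt_to_critical` are hypothetical. It uses the fact that the mean of the tilted distribution is nondecreasing in $a$, so a bisection on $a$ finds the critical tilting.

```python
def tilt(pmf, a):
    """Tilted pmf k -> c * a**k * pmf[k], where c = 1 / E[a**xi]."""
    weights = [p * a ** k for k, p in enumerate(pmf)]
    total = sum(weights)  # this is E[a**xi]
    return [w / total for w in weights]


def mean(pmf):
    """Mean of a pmf on {0, 1, ...} given as a list."""
    return sum(k * p for k, p in enumerate(pmf))


def tilt_to_critical(pmf, lo=1e-9, hi=1e3, iters=200):
    """Bisection on a; assumes the target mean 1 is reachable for some a in [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(tilt(pmf, mid)) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical supercritical offspring law on {0, 1, 2} with mean 1.3.
pmf = [0.2, 0.3, 0.5]
a = tilt_to_critical(pmf)
critical = tilt(pmf, a)  # tilted law with mean 1
```

For this particular example the critical value can be checked by hand: the tilted mean equals 1 exactly when $0.5 a^2 = 0.2$, i.e. $a = \sqrt{0.4}$.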

For a rooted tree $T$ (deterministic or random), the depth $h(v)$ of a node
$v$ is its distance to the root; the root thus has depth 0. Let $Z_k(T)$ be the
width at level $k$, i.e., the number of nodes at depth $k$, $k = 0, 1, \dots$. We
define, as usual, the width of the tree by

$$W = W(T) := \max_{k \ge 0} Z_k(T), \tag{1.1}$$

and the height by

$$H = H(T) := \max\{h(v) : v \in T\} = \max\{k : Z_k(T) > 0\}. \tag{1.2}$$
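
As a concrete illustration of definitions (1.1) and (1.2), here is a small sketch that computes the level sizes, width and height of a deterministic rooted tree. The parent-array encoding and the function names are assumptions made for this example only.

```python
from collections import Counter


def level_sizes(parent):
    """Return [Z_0, Z_1, ..., Z_H] for a rooted tree.

    parent[v] is the parent of node v, or None for the root; nodes are
    assumed to be numbered so that every parent precedes its children.
    """
    depth = []
    for p in parent:
        depth.append(0 if p is None else depth[p] + 1)
    counts = Counter(depth)
    return [counts[k] for k in range(max(depth) + 1)]


def width(parent):
    """W(T): the largest level size."""
    return max(level_sizes(parent))


def height(parent):
    """H(T): the largest depth of a node."""
    return len(level_sizes(parent)) - 1


# Hypothetical example tree with 7 nodes.
parent = [None, 0, 0, 1, 1, 2, 3]
```

For this example the level sizes are 1, 2, 3, 1, so the width and the height are both 3.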

It is well-known that the width and height of a conditioned Galton-Watson
tree $\mathcal{T}_n$ both are of the order $\sqrt{n}$. More precisely,
$n^{-1/2} W(\mathcal{T}_n)$ and $n^{-1/2} H(\mathcal{T}_n)$ both converge in
distribution, as $n \to \infty$, see e.g. [?], [?], [?] and [?]; moreover, they
converge jointly [1], [?]:

$$\bigl(n^{-1/2} W(\mathcal{T}_n),\, n^{-1/2} H(\mathcal{T}_n)\bigr) \overset{d}{\longrightarrow} (\sigma W, \sigma^{-1} H) \tag{1.3}$$

for some limit variables $W$ and $H$ that, furthermore, do not depend on
the distribution of $\xi$. ($W$ is the maximum of a Brownian excursion, and
$H \overset{d}{=} 2W$; see further [?].)

Two of the main results of the paper are to prove essentially optimal
uniform sub-Gaussian upper tail bounds for both $W(\mathcal{T}_n)/\sqrt{n}$ and
$H(\mathcal{T}_n)/\sqrt{n}$ for every offspring distribution $\xi$ with finite
variance. As an immediate consequence, the estimates
$\mathbb{E} W(\mathcal{T}_n) = O(n^{1/2})$ and $\mathbb{E} H(\mathcal{T}_n) = O(n^{1/2})$
hold; even these much weaker statements are to our knowledge new at this
level of generality. (For estimates assuming an exponential moment of $\xi$, see
e.g. [?].)

We let $C_1, C_2, \dots$ and $c_1, c_2, \dots$ denote positive constants that may
depend on the distribution of $\xi$ (and in particular on $\sigma^2$) but not on $n$
or other parameters unless explicitly indicated. (We use $C_i$ for "large" and $c_i$
for "small" constants.) Proofs are given in Section 4.

Theorem 1.1. Suppose that $\mathbb{E}\xi = 1$ and $\operatorname{Var}\xi < \infty$. Then

$$\mathbb{P}(W(\mathcal{T}_n) \ge x) \le C_1 e^{-c_1 x^2/n}$$

for all $x \ge 0$ and $n \ge 1$.

Theorem 1.2. Suppose that $\mathbb{E}\xi = 1$ and $0 < \operatorname{Var}\xi < \infty$. Then

$$\mathbb{P}(H(\mathcal{T}_n) \ge h) \le C_2 e^{-c_2 h^2/n} \tag{1.4}$$

for all $h \ge 0$ and $n \ge 1$.

The condition $\operatorname{Var}\xi > 0$ excludes the case $\mathbb{P}(\xi = 1) = 1$,
in which case $\mathcal{T}_n$ is a path of length $n$.

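Corollary 1.3. Suppose that $\mathbb{E}\xi = 1$ and $0 < \operatorname{Var}\xi < \infty$.
Then $\mathbb{E} W(\mathcal{T}_n) = O(n^{1/2})$ and $\mathbb{E} H(\mathcal{T}_n) = O(n^{1/2})$.
More generally, for every fixed $r < \infty$,
$\mathbb{E}(W(\mathcal{T}_n)^r) = O(n^{r/2})$ and $\mathbb{E}(H(\mathcal{T}_n)^r) = O(n^{r/2})$.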
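
One way to see how moment bounds of this kind follow from a sub-Gaussian tail bound is the standard tail-integration identity. The sketch below assumes a bound of the form $\mathbb{P}(W(\mathcal{T}_n) \ge x) \le C_1 e^{-c_1 x^2/n}$ for all $x \ge 0$, as in Theorem 1.1, and a fixed $r > 0$; the same computation applies to the height.

```latex
\begin{align*}
\mathbb{E}\bigl(W(\mathcal{T}_n)^r\bigr)
  &= r \int_0^\infty x^{r-1}\, \mathbb{P}\bigl(W(\mathcal{T}_n) \ge x\bigr)\, dx \\
  &\le r C_1 \int_0^\infty x^{r-1} e^{-c_1 x^2/n}\, dx \\
  &= \frac{r C_1}{2}\, \Gamma\!\left(\frac{r}{2}\right) \left(\frac{n}{c_1}\right)^{r/2}
   = O\bigl(n^{r/2}\bigr).
\end{align*}
```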

While our methods do not prove the convergence (1.3) of
$W(\mathcal{T}_n)/\sqrt{n}$ and $H(\mathcal{T}_n)/\sqrt{n}$, we have thus as a
corollary obtained tightness of them, and we believe that our argument might be
the simplest proof of this tightness.

On the other hand, knowing the limit result (1.3), it follows from the
fact that the bounds in Corollary 1.3 hold for every $r$ that all moments
(also joint) converge in (1.3). In particular, by the known formulas for the
moments of $W$ and $H \overset{d}{=} 2W$ (see e.g. [?]), as $n \to \infty$,

$$\mathbb{E}(W(\mathcal{T}_n)^r)/n^{r/2} \to \sigma^r\, \mathbb{E} W^r = \sigma^r 2^{-r/2} r(r-1)\Gamma(r/2)\zeta(r), \tag{1.5}$$
$$\mathbb{E}(H(\mathcal{T}_n)^r)/n^{r/2} \to \sigma^{-r}\, \mathbb{E} H^r = \sigma^{-r} 2^{r/2} r(r-1)\Gamma(r/2)\zeta(r). \tag{1.6}$$

For joint moments, see [?] and [?]. These results are well-known if $\xi$ is
assumed to have an exponential moment, see e.g. [14] and [11], but to our
knowledge they have not, even in the case $r = 1$, been proved before without
extra conditions.

We emphasise that we obtain these bounds for higher moments of both
$W(\mathcal{T}_n)$ and $H(\mathcal{T}_n)$, and even sub-Gaussian tail bounds for both
variables, without assuming more than a finite second moment of $\xi$. This is
somewhat surprising, at least for the width, since a $\xi$ with a large tail will
produce a very wide Galton-Watson tree $\mathcal{T}$ with comparatively large
probability; the explanation is that if the tree has one generation that is very
large, say of size $m$, then it will probably have many nodes (of order $m^2$) in
later generations, so the conditioning on exactly $n$ nodes makes this event very
unlikely if $m \gg \sqrt{n}$. In other words, the bounds on the width hold, not
because it is difficult for the Galton-Watson tree to get many branches, but
because it is difficult to get rid of them in time.

Remark 1.4. We assume $\sigma^2 = \operatorname{Var}\xi < \infty$ throughout the
paper. Since increasing $\sigma$ makes the width larger and the height smaller
(asymptotically at least), see e.g. (1.5)-(1.6), it is not reasonable to expect that
the results for the width generalize to the case $\sigma^2 = \infty$. However, for
the same reason it seems likely that the results for the height extend, but we have
not investigated that and leave that as an open problem. In particular, we ask
the following questions (assuming $\mathbb{E}\xi = 1$): Is
$\mathbb{E} H(\mathcal{T}_n) = O(n^{1/2})$ also if $\sigma^2 = \infty$? Is
$\mathbb{E} H(\mathcal{T}_n) = o(n^{1/2})$ if $\sigma^2 = \infty$?

Next we consider the width $Z_k(\mathcal{T}_n)$ at a given level $k$. Of course,
$Z_k(\mathcal{T}_n) \le W(\mathcal{T}_n)$, so the results above for $W(\mathcal{T}_n)$
immediately imply the same bounds for $Z_k(\mathcal{T}_n)$, uniformly in $k$. In
particular,

$$\mathbb{E} Z_k(\mathcal{T}_n) = O(n^{1/2}). \tag{1.7}$$

For $k \asymp n^{1/2}$, this is the correct order of $\mathbb{E} Z_k(\mathcal{T}_n)$;
in fact, $n^{-1/2} Z_{\lfloor x\sqrt{n} \rfloor}(\mathcal{T}_n)$ converges in
distribution for every fixed $x > 0$, and as a function of $x$, see [?], [11]
(assuming a finite exponential moment) and [?] (the general case, by probabilistic
methods).

For small $k$, on the other hand, $Z_k(\mathcal{T}_n)$ is smaller and it was proven
in [?, Theorem 1.13] that

$$\mathbb{E} Z_k(\mathcal{T}_n) = O(k), \tag{1.8}$$

uniformly for all $k \ge 1$ and $n \ge 1$. This is the best possible estimate, since
for any fixed $k$,

$$\mathbb{E} Z_k(\mathcal{T}_n) \to 1 + k\sigma^2, \qquad \text{as } n \to \infty, \tag{1.9}$$

see Meir and Moon [26] and Janson [?]. (It is shown in [17] that the sequence
$\mathbb{E} Z_k(\mathcal{T}_n)$ is not always monotone in $n$, so (1.8) is not a
consequence of (1.9).)

Furthermore, for large $k$, (1.8) is again not sharp. Indeed, if $k \gg \sqrt{n}$,
then typically $H(\mathcal{T}_n) < k$ and thus $Z_k(\mathcal{T}_n) = 0$. In fact, as
$k \to \infty$, $\mathbb{E} Z_k(\mathcal{T}_n)$ decreases exponentially, as is shown
by the next theorem, which combines the three phases ($k \ll \sqrt{n}$,
$k \asymp \sqrt{n}$, $k \gg \sqrt{n}$) in a unified statement. (Drmota and
Gittenberger [11] gave the weaker bound $C_3 n^{1/2} e^{-c_3 k/\sqrt{n}}$, assuming an
exponential moment on $\xi$.)

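Theorem 1.5. Suppose that $\mathbb{E}\xi = 1$ and $0 < \operatorname{Var}\xi < \infty$.
For all $n, k \ge 1$,

$$\mathbb{E} Z_k(\mathcal{T}_n) \le C_4 k e^{-c_4 k^2/n} \tag{1.10}$$

and also

$$\mathbb{E} Z_k(\mathcal{T}_n) \le C_5 n^{1/2} e^{-c_5 k^2/n} \tag{1.11}$$

(which is weaker for $k = o(\sqrt{n})$ but equivalent for larger $k$).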

Turning to higher moments of $Z_k(\mathcal{T}_n)$, we first note that for small $k$
there is no result corresponding to (1.10) without assuming higher moments of $\xi$.
In fact, already for $k = 1$, it is easy to see that for any $m \ge 1$,

$$\mathbb{P}(Z_1(\mathcal{T}_n) = m) \to m\, \mathbb{P}(\xi = m)$$

as $n \to \infty$, see [?] and Remark 3.1. It follows by Fatou's lemma that if
$\mathbb{E}\xi^{r+1} = \infty$ for some $r > 1$, then
$\mathbb{E} Z_1(\mathcal{T}_n)^r \to \infty$. The same holds for
$\mathbb{E} Z_k(\mathcal{T}_n)^r$ for every fixed $k \ge 1$.

Conversely, it was proven in [?, Theorem 1.13] that if $\mathbb{E}\xi^{r+1} < \infty$
for an integer $r \ge 1$, then $\mathbb{E} Z_k(\mathcal{T}_n)^r = O(k^r)$ uniformly in
$k \ge 1$ and $n \ge 1$. (The restriction to integer $r$ is for technical reasons in
the proof; we conjecture that the result holds for any real $r \ge 1$.)

On the other hand, the estimate (1.11) extends to higher moments without
assuming any moment condition on $\xi$ beyond our standing
$0 < \operatorname{Var}\xi < \infty$, i.e., $\mathbb{E}\xi^2 < \infty$ and $\xi$ is
not constant.

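Theorem 1.6. Suppose that $\mathbb{E}\xi = 1$ and $0 < \operatorname{Var}\xi < \infty$.
For any $r < \infty$,

$$\mathbb{E}\bigl(Z_k(\mathcal{T}_n)/\sqrt{n}\bigr)^r \le C_6(r) e^{-c_6 k^2/n} \tag{1.12}$$

for all $k, n \ge 1$. Furthermore,

$$\mathbb{P}(Z_k(\mathcal{T}_n) \ge x) \le C_7 e^{-c_6 k^2/n - c_7 x^2/n} \tag{1.13}$$

for all $x > 0$ and $n \ge 1$.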

1.1. Remarks on the limit law. We say that $T$ is theta distributed if it
has distribution function

$$\mathbb{P}(T \le x) = \sum_{j=-\infty}^{\infty} (1 - 2j^2x^2) e^{-j^2x^2} = \frac{4\pi^{5/2}}{x^3} \sum_{j=1}^{\infty} j^2 e^{-\pi^2 j^2/x^2}, \qquad x > 0.$$

The appearance of $T$ as the limit law of the height of random conditional
Galton-Watson trees was noted in [?, ?, ?, 13, ?, ?]. Furthermore, the
maximum of Brownian excursion of duration one is distributed as $T/\sqrt{2}$
(see, e.g., [?]). In (1.3), $W \overset{d}{=} T/\sqrt{2}$ and
$H \overset{d}{=} T\sqrt{2}$. It takes a moment to verify that for $x \ge 1$,

$$\mathbb{P}(T \ge x) \ge 2e^{-x^2}, \tag{1.14}$$

and for $x \le 1$,

$$\mathbb{P}(T \le x) \ge \frac{4\pi^{5/2}}{x^3} e^{-\pi^2/x^2}. \tag{1.15}$$
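
The two series for the theta distribution function can be compared numerically. The sketch below is only an illustration: the function names and the truncation level `terms` are assumptions of this example. It also checks the lower bound $2e^{-x^2}$ on the upper tail at one sample point.

```python
import math


def theta_cdf_series1(x, terms=200):
    """Truncation of sum over j of (1 - 2 j^2 x^2) exp(-j^2 x^2)."""
    return sum(
        (1 - 2 * j * j * x * x) * math.exp(-j * j * x * x)
        for j in range(-terms, terms + 1)
    )


def theta_cdf_series2(x, terms=200):
    """Truncation of (4 pi^(5/2) / x^3) * sum over j >= 1 of j^2 exp(-pi^2 j^2 / x^2)."""
    s = sum(j * j * math.exp(-((math.pi * j / x) ** 2)) for j in range(1, terms + 1))
    return 4 * math.pi ** 2.5 / x ** 3 * s


# The two representations should agree up to rounding error.
values = {x: (theta_cdf_series1(x), theta_cdf_series2(x)) for x in (0.7, 1.0, 1.5, 2.5)}
```

The first series converges quickly for large $x$ and the second for small $x$, which is why both forms are useful when bounding the right and left tails respectively.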

The bound of Theorem 1.1 combined with the limit result (1.3) then shows
that

$$c_1 \le \frac{2}{\sigma^2}.$$

Similarly, the bound of Theorem 1.2 combined with the limit result (1.3)
then shows that

$$c_2 \le \frac{\sigma^2}{2}.$$

It would be nice if $c_1$ and $c_2$ could be made more explicit. In any case,
the sub-Gaussian tail behaviour of the bounds in Theorems 1.1 and 1.2 is
optimal, modulo a constant factor (depending on $\xi$).
We also have the trivial observation that

$$W(\mathcal{T}_n) H(\mathcal{T}_n) \ge n - 1.$$

Thus, Theorems 1.1 and 1.2 yield the following left-tail upper bounds:

$$\mathbb{P}(W(\mathcal{T}_n) \le x) \le \mathbb{P}\Bigl(H(\mathcal{T}_n) \ge \frac{n-1}{x}\Bigr) \le C_2 \exp\Bigl(-\frac{c_2 (n-2)}{x^2}\Bigr)$$

and

$$\mathbb{P}(H(\mathcal{T}_n) \le x) \le \mathbb{P}\Bigl(W(\mathcal{T}_n) \ge \frac{n-1}{x}\Bigr) \le C_1 \exp\Bigl(-\frac{c_1 (n-2)}{x^2}\Bigr).$$

In view of (1.3) and the remark (1.15) about the theta distribution, these
bounds are optimal up to the constant factors $c_1$ and $c_2$.
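
The step from the right-tail bounds to these left-tail bounds can be spelled out as follows. This is a sketch of the arithmetic for the width, assuming the tail bound (1.4) in the form $\mathbb{P}(H(\mathcal{T}_n) \ge h) \le C_2 e^{-c_2 h^2/n}$. Each of the levels $1, \dots, H(\mathcal{T}_n)$ holds at most $W(\mathcal{T}_n)$ nodes and together they hold the $n - 1$ non-root nodes, which gives $W(\mathcal{T}_n) H(\mathcal{T}_n) \ge n - 1$.

```latex
\begin{align*}
W(\mathcal{T}_n) \le x
  &\implies H(\mathcal{T}_n) \ge \frac{n-1}{x}, \\
\frac{1}{n}\left(\frac{n-1}{x}\right)^2
  &= \frac{n(n-2) + 1}{n x^2} \ge \frac{n-2}{x^2}, \\
\mathbb{P}\bigl(W(\mathcal{T}_n) \le x\bigr)
  &\le C_2 \exp\!\left(-c_2 \frac{(n-1)^2}{n x^2}\right)
   \le C_2 \exp\!\left(-c_2 \frac{n-2}{x^2}\right).
\end{align*}
```

The bound for the height follows in the same way with the roles of $W$ and $H$ exchanged.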

2. Preliminaries 

The span of $\xi$, denoted $\operatorname{span}(\xi)$, is the largest integer $d$
such that $\xi/d$ a.s. is an integer. Note that $\mathbb{P}(|\mathcal{T}| = n) > 0$,
so $\mathcal{T}_n$ exists, if and only if $n \equiv 1$ modulo
$\operatorname{span}(\xi)$, except possibly for some small $n$.

We let $\xi_i$ denote i.i.d. copies of the random variable $\xi$, and let $S_n$ be
the partial sums of $\xi_1, \xi_2, \dots$,

$$S_n := \sum_{i=1}^{n} \xi_i. \tag{2.1}$$



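By a classic formula, see e.g. Dwass [12], Kolchin [22, Lemma 2.1.3, p. 105]
or Pitman [27], for $n \ge 1$,

$$\mathbb{P}(|\mathcal{T}| = n) = \frac{1}{n}\, \mathbb{P}(S_n = n - 1), \tag{2.2}$$

and, more generally, for $m, n \ge 1$ and independent copies
$\mathcal{T}_1, \dots, \mathcal{T}_m$ of $\mathcal{T}$,

$$\mathbb{P}\Bigl(\sum_{i=1}^{m} |\mathcal{T}_i| = n\Bigr) = \frac{m}{n}\, \mathbb{P}(S_n = n - m). \tag{2.3}$$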
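
Formula (2.2) can be checked exactly for small $n$. The sketch below is an illustration under stated assumptions: it uses a made-up critical offspring law on $\{0, 1, 2\}$, computes $\mathbb{P}(|\mathcal{T}| = n)$ directly from the recursive decomposition of the tree at its root, and compares with $\frac{1}{n}\mathbb{P}(S_n = n-1)$. All function names are hypothetical.

```python
from fractions import Fraction


def convolve(a, b, limit):
    """Convolution of two pmfs on {0, 1, ...}, truncated to indices <= limit."""
    out = [Fraction(0)] * (limit + 1)
    for i, x in enumerate(a[: limit + 1]):
        for j, y in enumerate(b[: limit + 1 - i]):
            out[i + j] += x * y
    return out


def tree_size_pmf(pmf, n_max):
    """size[n] = P(|T| = n), using |T| = 1 + (sizes of the root's subtrees)."""
    size = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        forest = [Fraction(1)]  # total size of zero subtrees is 0
        total = pmf[0] if n == 1 else Fraction(0)
        for k in range(1, len(pmf)):
            # forest[j] = P(k independent subtrees have j nodes in total)
            forest = convolve(forest, size, n - 1)
            total += pmf[k] * forest[n - 1]
        size[n] = total
    return size


def sum_pmf(pmf, n, limit):
    """pmf of S_n = xi_1 + ... + xi_n, truncated to indices <= limit."""
    dist = [Fraction(1)]
    for _ in range(n):
        dist = convolve(dist, pmf, limit)
    return dist


# Hypothetical critical offspring law: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4.
pmf = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
size = tree_size_pmf(pmf, 12)
dwass = [None] + [Fraction(1, n) * sum_pmf(pmf, n, n - 1)[n - 1] for n in range(1, 13)]
```

Exact rational arithmetic is used so that the two sides can be compared with equality rather than a tolerance.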



Together with the local central limit theorem, (2.2) implies [22, Lemma
2.1.4, p. 105], with $d := \operatorname{span}(\xi)$ (recall that we only consider
$n$ such that $n \equiv 1 \pmod d$),

$$\mathbb{P}(|\mathcal{T}| = n) \sim \frac{d}{\sqrt{2\pi}\,\sigma}\, n^{-3/2}. \tag{2.4}$$



We will use a one-sided tail bound for $S_n$, which we take from Janson [?]
and which only requires our (weak) conditions. Note that, apart from the
values of the constants, the bound in (2.5) is exactly as the limit given by
the local central limit theorem when it applies; hence, at least for $m$ not too
large, it is of the best possible kind.

Lemma 2.1 ([?, Lemma 2.1]). Suppose that $\xi_1, \xi_2, \dots$ are i.i.d.,
non-negative and integer-valued random variables, with $\mathbb{E}\xi_i = 1$ and
$\operatorname{Var}\xi_i < \infty$, and let $S_n := \sum_{i=1}^{n} \xi_i$. Then, for
all $n \ge 1$ and $m \ge 0$,

$$\mathbb{P}(S_n = n - m) \le C_7 n^{-1/2} e^{-c_7 m^2/n}. \tag{2.5}$$

Remark 2.2. We can write the probability in (2.5) as
$\mathbb{P}\bigl(\sum_{i=1}^{n} (1 - \xi_i) = m\bigr)$. The point is that even without
any assumptions on the tail of $\xi_i$ beyond finite variance, the variables
$1 - \xi_i$ are bounded above, which is enough for strong tail bounds for $m > 0$.
(There is no similar bound for $m < 0$ under our weak conditions.) Cf. the related
tail bound $\mathbb{P}(S_n \le n - m) \le C_8 e^{-c_8 m^2/n}$, which follows by (2.6)
below.

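We will use the following version of Bernstein's inequality, which is valid
for variables with a one-sided bound, see e.g. [13, (2.9)-(2.13)] and
[25, Theorem 2.7].

Lemma 2.3. Let $X_1, X_2, \dots, X_n$ be independent random variables such that
$X_i - \mathbb{E}X_i \le b$ for every $i$. Then, with
$V := \sum_{i=1}^{n} \operatorname{Var}(X_i)$, for every $t \ge 0$,

$$\mathbb{P}\Bigl(\sum_{i=1}^{n} (X_i - \mathbb{E}X_i) \ge t\Bigr) \le \exp\Bigl(-\frac{t^2}{2V + 2bt/3}\Bigr). \tag{2.6}$$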

3. A size-biased Galton-Watson tree

Let $\hat\xi$ be a random variable with the size-biased distribution

$$\mathbb{P}(\hat\xi = m) = m\, \mathbb{P}(\xi = m). \tag{3.1}$$

(Note that this is a probability distribution on $\{1, 2, \dots\}$ since
$\mathbb{E}\xi = 1$, and that $\hat\xi \ge 1$.)
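
As a quick illustration of (3.1), the following sketch size-biases a made-up critical offspring law and checks two elementary facts: the result is a probability distribution, and its mean is $\mathbb{E}\xi^2 = 1 + \sigma^2$. The function name and the example law are assumptions of this example.

```python
from fractions import Fraction


def size_bias(pmf):
    """Size-biased pmf m -> m * pmf[m]; sums to 1 when the mean of pmf is 1."""
    return [m * p for m, p in enumerate(pmf)]


# Hypothetical critical offspring law: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4.
pmf = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
biased = size_bias(pmf)

# Variance of xi (its mean is 1) and mean of the size-biased variable.
variance = sum(m * m * p for m, p in enumerate(pmf)) - 1
biased_mean = sum(m * p for m, p in enumerate(biased))
```

Here the size-biased law puts mass 1/2 on each of 1 and 2, and its mean 3/2 equals $1 + \sigma^2$ with $\sigma^2 = 1/2$.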

Let, for $k \ge 1$, $\hat{\mathcal{T}}^{(k)}$ be the modified Galton-Watson tree
defined as follows: There are two types of nodes: normal and mutant. Normal nodes
have offspring (outdegree) according to independent copies of $\xi$, while mutant
nodes have offspring according to independent copies of $\hat\xi$. Moreover,
all children of a normal node are normal, while for each mutant node, one
of its children is selected uniformly at random and called its heir; the heir
is mutant if it has depth less than $k$ but normal if the depth is at least $k$,
and all other children are normal. (Alternatively, we can call the mutants
kings, with a reproductive behaviour different from the common people. At
time $k$, a republic is introduced, and everybody becomes equal.)

There are thus exactly $k$ mutant nodes (the root and its successive heirs at
depths $1, \dots, k-1$), which together with the heir $v^*$ of the last mutant node
form a path from the root to $v^*$ at depth $k$. We call this path the spine of
$\hat{\mathcal{T}}^{(k)}$.

Remark 3.1. This construction with $k = \infty$ was introduced by Lyons,
Pemantle and Peres [?], and is called the size-biased Galton-Watson tree;
in this case the spine is infinite so the tree is infinite. The underlying
size-biased Galton-Watson process is the same as the Q-process studied in [3,
Section 1.14]. For any fixed $k$, the first $k$ generations of $\mathcal{T}_n$
converge in distribution to the first $k$ generations of
$\hat{\mathcal{T}}^{(\infty)}$.

Our $\hat{\mathcal{T}}^{(k)}$ is a truncated version of this, which grows like a
normal Galton-Watson tree after generation $k$; thus $\hat{\mathcal{T}}^{(k)}$ is
a.s. finite.

An equivalent construction is to start with the spine, and attach independent
copies of $\mathcal{T}$ to it; the number of such trees attached to each node in
the spine except the last one (the top node) has distribution $\hat\xi - 1$, but the
number attached to the top node is $\xi$.

The probability that a given mutant node has $m$ children and that a given
one of them is selected as heir is, by (3.1),

$$\frac{1}{m}\, \mathbb{P}(\hat\xi = m) = \mathbb{P}(\xi = m), \qquad m \ge 1.$$

It follows that for any rooted tree $T$, and any path $\gamma$ in $T$ from the root
to a node at depth $k$, letting $d_1, d_2, \dots$ denote the outdegrees of the nodes
in $T$, taken in breadth-first order, say,

$$\mathbb{P}(\hat{\mathcal{T}}^{(k)} = T \text{ with } \gamma \text{ as spine}) = \prod_{v} \mathbb{P}(\xi = d_v) = \mathbb{P}(\mathcal{T} = T). \tag{3.2}$$

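Since the possible spines in $T$ are in one-to-one correspondence with the
nodes at depth $k$, the number of them is $Z_k(T)$, and thus

$$\mathbb{P}(\hat{\mathcal{T}}^{(k)} = T) = Z_k(T)\, \mathbb{P}(\mathcal{T} = T). \tag{3.3}$$

In other words, $\hat{\mathcal{T}}^{(k)}$ has the distribution of $\mathcal{T}$
biased by $Z_k$, the size of generation $k$. In particular, this yields, summing
(3.3) over all trees $T$ of size $|T| = n$,

$$\mathbb{P}(|\hat{\mathcal{T}}^{(k)}| = n) = \mathbb{E}\bigl(Z_k(\mathcal{T});\ |\mathcal{T}| = n\bigr)$$

and thus

$$\mathbb{E} Z_k(\mathcal{T}_n) = \frac{\mathbb{E}\bigl(Z_k(\mathcal{T});\ |\mathcal{T}| = n\bigr)}{\mathbb{P}(|\mathcal{T}| = n)} = \frac{\mathbb{P}(|\hat{\mathcal{T}}^{(k)}| = n)}{\mathbb{P}(|\mathcal{T}| = n)}. \tag{3.4}$$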

4. Proofs 

Proof of Theorem 1.1. Consider the breadth first search of the Galton-Watson
tree. As is well known, this search keeps a queue of $Q_i$ nodes with $Q_0 = 1$
and the recursion $Q_i = Q_{i-1} - 1 + \xi_i$, with $\xi_i$ i.i.d. copies of $\xi$ as
above; hence $Q_j = 1 + \tilde S_j$, where
$\tilde S_j := \sum_{i=1}^{j} (\xi_i - 1) = S_j - j$. The breadth first
search stops, and the tree is completely explored, when $Q_j$ becomes 0; in
order for the tree to have size $n$ we thus have $Q_j > 0$ for $0 \le j < n$ and
$Q_n = 0$; equivalently, $\tilde S_j \ge 0$ for $j < n$ and $\tilde S_n = -1$.

When the breadth first search just has completed exploring the nodes at
level $k - 1$, the queue consists of exactly the nodes at level $k$. Hence each
$Z_k$ is some $Q_j$, and

$$W := \max_{k \ge 0} Z_k \le \max_{j \ge 0} Q_j.$$
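
The relation between the breadth first search queue and the level sizes can be illustrated on a small deterministic tree. The sketch below is an assumption-laden example: the tree is encoded as a list of child lists with node 0 as root, and the function names are hypothetical.

```python
from collections import deque


def bfs_queue_process(children):
    """Return [Q_0, ..., Q_n]: the queue length after exploring j nodes."""
    queue = deque([0])
    sizes = [1]
    while queue:
        v = queue.popleft()         # explore the next node in BFS order
        queue.extend(children[v])   # its children join the end of the queue
        sizes.append(len(queue))
    return sizes


def level_sizes(children):
    """Return [Z_0, Z_1, ...]: the number of nodes at each depth."""
    levels, current = [], [0]
    while current:
        levels.append(len(current))
        current = [c for v in current for c in children[v]]
    return levels


# Hypothetical example tree with 7 nodes.
children = [[1, 2], [3, 4], [5], [6], [], [], []]
q = bfs_queue_process(children)
z = level_sizes(children)
```

In this example $Z_k$ equals $Q_j$ with $j = Z_0 + \dots + Z_{k-1}$, the number of nodes explored before level $k$ is reached, so the width is at most the maximum queue length.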

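As a result, for the conditioned Galton-Watson tree $\mathcal{T}_n$,

$$\mathbb{P}(W \ge x + 1) \le \mathbb{P}\bigl(\max_j Q_j \ge x + 1\bigr) = \mathbb{P}\bigl(\max_j \tilde S_j \ge x \bigm| \tilde S_j \ge 0,\ j < n, \text{ and } \tilde S_n = -1\bigr). \tag{4.1}$$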
We get rid of the conditioning on $\tilde S_j \ge 0$ ($j < n$) by the standard
rotation argument: for each (deterministic) sequence $x_1, \dots, x_n$ of integers
$\ge -1$ with sum $\sum_{i=1}^{n} x_i = -1$, there is exactly one rotation
$x_i^{(t)} := x_{i+t}$ with $t \in \{0, \dots, n-1\}$ and indices taken modulo $n$,
such that the partial sums $s_j^{(t)} := \sum_{i=1}^{j} x_i^{(t)} \ge 0$ for
$1 \le j < n$. Hence, we can obtain $(\tilde S_j)_{j=1}^{n}$ with the conditional
distribution given $\tilde S_j \ge 0$, $j < n$, and $\tilde S_n = -1$, as required
in (4.1), by conditioning $(\tilde S_j)_{j=1}^{n}$ on $\tilde S_n = -1$ and then
taking the unique correct rotation. The rotation may change $\max_j \tilde S_j$, but
for the correctly rotated sequence, whose minimum over $j \le n$ is
$\tilde S_n = -1$, we have

$$\max_{j \le n} \tilde S_j = \max_{j \le n} \tilde S_j - \min_{j \le n} \tilde S_j - 1,$$

and the quantity $\max_j \tilde S_j - \min_j \tilde S_j$ is changed by at most 1 by
a rotation of $\tilde\xi_i := \xi_i - 1$, $i = 1, \dots, n$. Hence, the rotation
argument shows that

$$\mathbb{P}\bigl(\max_{j \le n} \tilde S_j \ge x \bigm| \tilde S_j \ge 0,\ j < n, \text{ and } \tilde S_n = -1\bigr) \le \mathbb{P}\bigl(\max_{j \le n} \tilde S_j - \min_{j \le n} \tilde S_j \ge x \bigm| \tilde S_n = -1\bigr).$$
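
The rotation argument can be made concrete with a short sketch. One standard way to find the unique good rotation, used here as an assumption of this example rather than as the authors' construction, is to start the sequence just after the first time the walk attains its overall minimum. The function names and the sample sequence are hypothetical.

```python
from itertools import accumulate


def partial_sums(x):
    """Partial sums s_1, ..., s_n of the sequence x."""
    return list(accumulate(x))


def good_rotation(x):
    """Return t such that x[t:] + x[:t] has partial sums >= 0 for j < n.

    x must consist of integers >= -1 with total sum -1.
    """
    if sum(x) != -1 or any(v < -1 for v in x):
        raise ValueError("need integers >= -1 with sum -1")
    best, t = 0, 0
    for j, s in enumerate(partial_sums(x), start=1):
        if s < best:            # a new minimum is reached for the first time
            best, t = s, j
    return t % len(x)


# Hypothetical increments x_i = (number of children) - 1.
x = [1, -1, -1, 0, 1, -1]
t = good_rotation(x)
rotated = x[t:] + x[:t]
```

For this sequence the rotation is $t = 3$, and the rotated partial sums are 0, 1, 0, 1, 0, -1, which stay non-negative until the final step.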



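By (4.1) we thus have

$$\begin{aligned}
\mathbb{P}\bigl(\max_j Q_j \ge 2x + 2\bigr)
 &\le \mathbb{P}\bigl(\max_{j \le n} \tilde S_j - \min_{j \le n} \tilde S_j \ge 2x + 1 \bigm| \tilde S_n = -1\bigr) \\
 &\le \mathbb{P}\bigl(\max_{j \le n} \tilde S_j \ge x \bigm| \tilde S_n = -1\bigr) + \mathbb{P}\bigl(\min_{j \le n} \tilde S_j \le -x - 1 \bigm| \tilde S_n = -1\bigr).
\end{aligned}$$

Furthermore, the reflection $\xi_i \leftrightarrow \xi_{n+1-i}$, which takes
$\tilde S_j \leftrightarrow \tilde S_n - \tilde S_{n-j}$, shows that the last two
probabilities are the same, and we thus have

$$\mathbb{P}\bigl(\max_j Q_j \ge 2x + 2\bigr) \le 2\, \mathbb{P}\bigl(\max_{j \le n} \tilde S_j \ge x \bigm| \tilde S_n = -1\bigr). \tag{4.2}$$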

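Fix $x > 0$ and let $\tau$ be the stopping time
$\min\{j \ge 0 : \tilde S_j \ge x\}$. Then (4.2) can be written

$$\mathbb{P}\bigl(\max_j Q_j \ge 2x + 2\bigr) \le 2\, \mathbb{P}(\tau \le n \mid \tilde S_n = -1) = \frac{2\, \mathbb{P}(\tilde S_n = -1 \mid \tau \le n)\, \mathbb{P}(\tau \le n)}{\mathbb{P}(\tilde S_n = -1)}. \tag{4.3}$$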

By definition, $\tilde S_\tau \ge x$. Further, for any $t \le n$ and $y \ge x$, by
Lemma 2.1,

$$\begin{aligned}
\mathbb{P}(\tilde S_n = -1 \mid \tau = t \text{ and } \tilde S_\tau = y)
 &= \mathbb{P}(\tilde S_n - \tilde S_t = -y - 1) \\
 &= \mathbb{P}(\tilde S_{n-t} = -(y + 1)) \le C_7 n^{-1/2} e^{-c_7 (y+1)^2/n} \le C_7 n^{-1/2} e^{-c_7 x^2/n}.
\end{aligned}$$

Consequently,

$$\mathbb{P}(\tilde S_n = -1 \mid \tau \le n) \le C_7 n^{-1/2} e^{-c_7 x^2/n},$$

and (4.3) yields

$$\mathbb{P}\bigl(\max_j Q_j \ge 2x + 2\bigr) \le \frac{2 C_7 n^{-1/2} e^{-c_7 x^2/n}}{\mathbb{P}(\tilde S_n = -1)} \le C_{10} e^{-c_9 x^2/n}, \tag{4.4}$$

since $\mathbb{P}(\tilde S_n = -1) \ge c_{10} n^{-1/2}$ by the standard local central
limit theorem. Finally, since
$\mathbb{P}(W \ge 2x + 2) \le \mathbb{P}(\max_j Q_j \ge 2x + 2)$, the proof is complete.

□ 

Proof of Theorem 1.2. By choosing $C_2$ sufficiently large we may assume that
$h \ge \sqrt{n}$. We may also assume that $h$ is an integer. Our proof of (1.4) is
based on the following observation: if $v$ is a node of $\mathcal{T}_n$ with "large"
height then either there are many edges leaving the path from the root to $v$, or
many of the ancestors of $v$ have exactly one child. In the first case, we will
be forced to consider whether the majority of edges leaving the root-to-$v$
path lead to nodes which are lexicographically before, or after, $v$. To do
so, we use lexicographic and reverse-lexicographic depth-first search (DFS)
of $\mathcal{T}_n$.

To define lexicographic DFS of $\mathcal{T}_n$, think of $\mathcal{T}_n$ as a plane
tree (i.e. as embedded in the Ulam-Harris tree $\mathcal{U}$) and list the nodes of
$\mathcal{T}_n$ in lexicographic order as $v_0, v_1, \dots, v_{n-1}$. We then let
$Q^d_0 = 1$ and $Q^d_i = Q^d_{i-1} - 1 + \xi_{v_{i-1}}$, where $\xi_{v_i}$ is the
number of children of $v_i$ in $\mathcal{T}_n$. (This is sometimes called the
Łukasiewicz path of $\mathcal{T}_n$; see, e.g., [?].) The reverse-lexicographic
depth-first search of $\mathcal{T}_n$ is the sequence $Q^r_0, \dots, Q^r_n$ obtained
by performing a lexicographic depth-first search on the mirror image of
$\mathcal{T}_n$ (so if the root has children $1, \dots, k$ in $\mathcal{T}_n$, then
$k$ is the first rather than last child visited, and so on). We remark that the
lexicographic and reverse-lexicographic depth-first search both are identical in
distribution to the breadth-first search of $\mathcal{T}_n$.
Now let $p_1 = \mathbb{P}(\xi = 1)$ and let $q_1 = 1 - p_1$. If $v$ is a node of
$\mathcal{T}_n$ with $h(v) = h$, then, writing $j$ (resp. $k$) for the index of $v$
in lexicographic (resp. reverse-lexicographic) order, either
$\max(Q^d_j, Q^r_k) \ge (q_1/3)h$, or else at least $(p_1 + q_1/3)h$ of the
ancestors of $v$ have exactly one child. Let $S$ be the set of trees $T$ with
$|T| = n$, such that $T$ contains a node $v$ possessing at least
$(p_1 + q_1/3)h(v)$ ancestors with exactly one child and for which $h(v) = h$. Then
let $\mathcal{E} := \{\mathcal{T}_n \in S\} = \bigcup_{T \in S} \{\mathcal{T}_n = T\}$.

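Since $Q^d$ and $Q^r$ have the same distribution as $Q$, we then have

$$\begin{aligned}
\mathbb{P}(H(\mathcal{T}_n) \ge h)
 &\le \mathbb{P}\bigl(\max_j Q^d_j \ge (q_1/3)h\bigr) + \mathbb{P}\bigl(\max_k Q^r_k \ge (q_1/3)h\bigr) + \mathbb{P}(\mathcal{E}) \\
 &= 2\, \mathbb{P}\bigl(\max_i Q_i \ge (q_1/3)h\bigr) + \mathbb{P}(\mathcal{E}) \\
 &\le C_{11} e^{-c_{11} h^2/n} + \mathbb{P}(\mathcal{E}),
\end{aligned} \tag{4.5}$$

the latter inequality holding due to (4.4).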

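Next, for each tree $T \in S$, fix a path $\gamma_T$ from the root of $T$ to a node
$v$ with $h(v) = h$ and with at least $(p_1 + q_1/3)h$ ancestors with exactly one
child (such a node exists by the definition of $S$). Then by (3.2), writing
$\hat\xi_0, \dots, \hat\xi_{h-1}$ for the numbers of children of the $h$ mutant nodes
of $\hat{\mathcal{T}}^{(h)}$,

$$\begin{aligned}
\mathbb{P}(\mathcal{T} \in S) &= \sum_{T \in S} \mathbb{P}(\mathcal{T} = T) \\
 &= \sum_{T \in S} \mathbb{P}(\hat{\mathcal{T}}^{(h)} = T \text{ with } \gamma_T \text{ as spine}) \\
 &= \mathbb{P}\Bigl(\bigcup_{T \in S} \{\hat{\mathcal{T}}^{(h)} = T \text{ with } \gamma_T \text{ as spine}\}\Bigr) \\
 &\le \mathbb{P}\Bigl(\sum_{i=0}^{h-1} \mathbf{1}_{[\hat\xi_i = 1]} \ge (p_1 + q_1/3)h\Bigr).
\end{aligned} \tag{4.6}$$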



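The $\mathbf{1}_{[\hat\xi_i = 1]}$ are Bernoulli($p_1$), so by Lemma 2.3,

$$\mathbb{P}\Bigl(\sum_{i=0}^{h-1} \mathbf{1}_{[\hat\xi_i = 1]} \ge (p_1 + q_1/3)h\Bigr) \le \exp\Bigl(-\frac{(q_1 h/3)^2}{2 p_1 q_1 h + 2 q_1^2 h/9}\Bigr). \tag{4.7}$$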
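
The exponent in this bound simplifies considerably. As a sketch, assume the one-sided Bernstein inequality in the form $\mathbb{P}(\sum_i (X_i - \mathbb{E}X_i) \ge t) \le \exp(-t^2/(2V + 2bt/3))$, applied to $h$ independent Bernoulli($p_1$) indicators, so that $V = h p_1 q_1$, $b = q_1$ and $t = q_1 h/3$:

```latex
\begin{align*}
2V + \frac{2bt}{3}
  &= 2 p_1 q_1 h + \frac{2 q_1^2 h}{9}
   = \frac{q_1 h}{9}\,(18 p_1 + 2 q_1), \\
\frac{t^2}{2V + 2bt/3}
  &= \frac{q_1^2 h^2/9}{(q_1 h/9)(18 p_1 + 2 q_1)}
   = \frac{q_1 h}{18 p_1 + 2 q_1}
   = \frac{h}{18 p_1/q_1 + 2}.
\end{align*}
```

The probability therefore decays exponentially in $h$, at a rate depending only on $p_1$.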



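It follows by (2.4) and (4.6)-(4.7) that

$$\mathbb{P}(\mathcal{E}) = \frac{\mathbb{P}(\mathcal{T} \in S)}{\mathbb{P}(|\mathcal{T}| = n)} \le C_{12} n^{3/2} \exp\Bigl(-\frac{h}{18 p_1/q_1 + 2}\Bigr) \le C_{13} e^{-c_{12} h^2/n}$$

for all $h \ge \sqrt{n}$. Together with (4.5) we have thus proved

$$\mathbb{P}(H(\mathcal{T}_n) \ge h) \le C_{11} e^{-c_{11} h^2/n} + C_{13} e^{-c_{12} h^2/n},$$

which establishes (1.4). □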

Proof of Theorem 1.5. Note first that the case $k \ge n$ is trivial, since
$H(\mathcal{T}_n) < n$ and $Z_k(\mathcal{T}_n) = 0$ for $k \ge n$. Further, if
$k \le \sqrt{n}$, then the result follows from (1.8). Hence it suffices to consider
$\sqrt{n} < k < n$.

Consider the random tree $\hat{\mathcal{T}}^{(k)}$ constructed in Section 3. By the
alternative construction described there, we can regard the tree as the $k$ mutant
nodes (the spine except its top node) together with a random number $M$ of
attached independent copies of $\mathcal{T}$. Hence,

$$|\hat{\mathcal{T}}^{(k)}| = k + \sum_{i=1}^{M} |\mathcal{T}_i|, \tag{4.8}$$

where $\mathcal{T}_i$ are independent copies of $\mathcal{T}$, independent also of
$M$. The number $M$ is the total number of normal children (including the top node)
of the $k$ mutants, and thus

$$M = \sum_{i=1}^{k} (\hat\xi_i - 1) + 1, \tag{4.9}$$

where $\hat\xi_i$ are i.i.d. with the distribution (3.1).

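Thus, for $m \ge 1$ and $n > k$, using (4.8), (2.3) and Lemma 2.1,

$$\mathbb{P}(|\hat{\mathcal{T}}^{(k)}| = n \mid M = m) = \mathbb{P}\Bigl(\sum_{i=1}^{m} |\mathcal{T}_i| = n - k\Bigr) = \frac{m}{n-k}\, \mathbb{P}(S_{n-k} = n - k - m) \le C_7 \frac{m}{(n-k)^{3/2}} e^{-c_7 m^2/(n-k)}. \tag{4.10}$$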

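The summands $\hat\xi_i - 1$ in (4.9) have mean
$\mathbb{E}(\hat\xi - 1) = \mathbb{E}\xi^2 - 1 = \sigma^2 > 0$. We truncate them and
define $\hat\xi'_i := \min(\hat\xi_i, K)$, where $K$ is chosen so large that
$\mathbb{E}\hat\xi'_i > 1 + \sigma^2/2$. We apply Bernstein's inequality (2.6) to
$-\hat\xi'_i$, and obtain, since $\operatorname{Var}(\hat\xi'_i) < \infty$ and thus
$V = O(k)$,

$$\begin{aligned}
\mathbb{P}(M \le k\sigma^2/4)
 &\le \mathbb{P}\Bigl(\sum_{i=1}^{k} (\hat\xi_i - 1) \le k\sigma^2/4\Bigr)
  \le \mathbb{P}\Bigl(\sum_{i=1}^{k} (\hat\xi'_i - 1) \le k\sigma^2/4\Bigr) \\
 &\le \mathbb{P}\Bigl(\sum_{i=1}^{k} (\hat\xi'_i - \mathbb{E}\hat\xi'_i) \le -k\sigma^2/4\Bigr) \le e^{-c_{13} k}.
\end{aligned} \tag{4.11}$$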



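Note that $|\hat{\mathcal{T}}^{(k)}| \ge M + k$ by (4.8), so if
$M = m > k\sigma^2/4$, we only have to consider
$n \ge m + k > (1 + \sigma^2/4)k$, and for such $n$, $n - k \ge c_{14} n$. Hence,
for $m > k\sigma^2/4$, (4.10) yields

$$\mathbb{P}(|\hat{\mathcal{T}}^{(k)}| = n \mid M = m) \le C_{14} \frac{m}{n^{3/2}} e^{-c_7 m^2/n} \le C_{15} \frac{1}{n} e^{-c_{15} m^2/n}. \tag{4.12}$$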

If $\sqrt{n} < k < n$, (4.11) and (4.12) yield

$$\mathbb{P}(|\hat{\mathcal{T}}^{(k)}| = n) \le e^{-c_{13} k} + \max_{m > k\sigma^2/4} C_{15} \frac{1}{n} e^{-c_{15} m^2/n} \le C_{16} \frac{1}{n} e^{-c_{16} k^2/n}. \tag{4.13}$$

Since $\mathbb{P}(|\mathcal{T}| = n) \ge c_{17} n^{-3/2}$ by (2.4), (3.4) and (4.13)
yield, if $\sqrt{n} < k < n$,

$$\mathbb{E} Z_k(\mathcal{T}_n) \le C_{17} n^{1/2} e^{-c_{16} k^2/n} \le C_{18} k e^{-c_{16} k^2/n}, \tag{4.14}$$

which completes the proof. (We remarked above that it suffices to consider
such $k$.) □



Proof of Theorem 1.6. First, by Theorem 1.1,

$$\mathbb{P}(Z_k(\mathcal{T}_n) \ge x) \le \mathbb{P}(W(\mathcal{T}_n) \ge x) \le C_1 e^{-c_1 x^2/n}.$$

Further, since $Z_k(\mathcal{T}_n) > 0$ implies $H(\mathcal{T}_n) \ge k$, Theorem 1.2
implies that

$$\mathbb{P}(Z_k(\mathcal{T}_n) \ge x) \le \mathbb{P}(H(\mathcal{T}_n) \ge k) \le C_2 e^{-c_2 k^2/n}.$$

Taking the geometric mean of these bounds we obtain (1.13). Further, (1.13)
implies, for any $r > 0$, with $Z := Z_k(\mathcal{T}_n)/\sqrt{n}$,

$$\mathbb{E} Z^r = r \int_0^\infty x^{r-1}\, \mathbb{P}(Z \ge x)\, dx \le r C_7 e^{-c_6 k^2/n} \int_0^\infty x^{r-1} e^{-c_7 x^2}\, dx = C_{19}(r) e^{-c_6 k^2/n}. \qquad □$$
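
The geometric-mean step can be made explicit. Since a probability bounded by both $a$ and $b$ is bounded by $\min(a, b) \le \sqrt{ab}$, the two bounds above combine as in the sketch below; the resulting identification of the constants is one admissible choice, not necessarily the one intended by the authors.

```latex
\begin{align*}
\mathbb{P}\bigl(Z_k(\mathcal{T}_n) \ge x\bigr)
  &\le \sqrt{C_1 e^{-c_1 x^2/n} \cdot C_2 e^{-c_2 k^2/n}}
   = \sqrt{C_1 C_2}\; e^{-(c_2/2) k^2/n - (c_1/2) x^2/n}, \\
\int_0^\infty x^{r-1} e^{-c x^2}\, dx
  &= \frac{\Gamma(r/2)}{2\, c^{r/2}} \qquad (c > 0,\ r > 0).
\end{align*}
```

This matches the form of (1.13) with $C_7 = \sqrt{C_1 C_2}$, $c_6 = c_2/2$ and $c_7 = c_1/2$, and the Gamma integral shows that the final constant is finite for every $r > 0$.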